Abstract. Conditional Equi-concentration of Types on I-projections (ICET)
and Extended Gibbs Conditioning Principle (EGCP) provide an extension of
Conditioned Weak Law of Large Numbers and of Gibbs Conditioning Principle
to the case of non-unique Relative Entropy Maximizing (REM) distribution
(aka I-projection). ICET and EGCP give a probabilistic justification to
REM under rather general conditions. μ-projection variants of the results are
introduced. They provide a probabilistic justification to Maximum Probability
(MaxProb) method. 'REM/MaxEnt or MaxProb?' question is discussed
briefly. Jeffreys Conditioning Principle is mentioned.



1. Introduction 

Relative entropy maximization (REM/MaxEnt) is usually performed under moment
consistency constraints. The constraints define a feasible set of probability
distributions which is convex and closed, and hence the relative entropy maximizing
distribution (aka I-projection) is unique. For such sets the Conditioned Weak Law
of Large Numbers (CWLLN) is established and provides a probabilistic justification
of REM/MaxEnt. Gibbs conditioning principle (GCP) - a stronger version of
CWLLN - which is also established for such sets, gives a further insight into the
'phenomenon' of conditional concentration of empirical measure on I-projections.

This work strives to develop extensions of CWLLN and GCP to the case of
non-unique I-projection. The proposed Conditional Equi-concentration of Types on
I-projections (ICET), which extends CWLLN, says, informally, that types (i.e.,
empirical distributions) conditionally concentrate on each of the proper I-projections
in equal measure. Extended Gibbs conditioning principle (EGCP) states that, in
the case of multiple proper I-projections, the probability of an outcome is given by
the equal-weight mixture of proper I-projection probabilities of the outcome.

A generalization (cf. 021) of a result on convergence of maximum/supremum
probability types (μ-projections) to I-projections (cf. Thm 1) directly permits
to state either the well-established CWLLN, GCP or their extensions equivalently
in terms of μ-projections. The μ-projection variants of the probabilistic laws allow
for a deeper reading than their I-projection counterparts, since the μ-laws express
the asymptotic conditional behavior of types in terms of the asymptotically most
probable types. They provide a probabilistic justification to the Maximum Probability
(MaxProb) method.
Though μ-projections and I-projections are asymptotically identical, in the case
of finite samples they are in general different.

2. Terminology and notation 

Let $\{X_l\}_{l=1}^n$ be a sequence of independently and identically distributed random
variables with a common law (measure) on a measurable space. Let the measure
be concentrated on m atoms from a set $X = \{x_1, x_2, \dots, x_m\}$ called support or
alphabet. Hereafter X will be assumed finite. An element of X will be called
outcome or letter. Let $q_i$ denote the probability (measure) of i-th element of X;
q will be called source or generator. Let P(X) be a set of all probability mass
functions (pmf's) on X.

A type (also called n-type, empirical measure, frequency distribution or occurrence
vector) induced by a sequence $\{X_l\}_{l=1}^n$ is the pmf $\nu^n \in P(X)$ whose i-th
element is defined as: $\nu_i^n = n_i/n$, where $n_i = \sum_{l=1}^n I(X_l = x_i)$ and $I(\cdot)$ is the
characteristic function. Multiplicity $\Gamma(\nu^n)$ of type $\nu^n$ is: $\Gamma(\nu^n) = n!/\prod_{i=1}^m n_i!$.
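
The definitions of type and multiplicity can be illustrated by a minimal sketch. The three-letter alphabet and the sample sequence are hypothetical, and the helper names `type_of` and `multiplicity` are introduced here only for illustration.

```python
from collections import Counter
from math import factorial

def type_of(sequence, alphabet):
    """Return the occurrence counts n_i and the n-type nu_i = n_i / n."""
    n = len(sequence)
    tally = Counter(sequence)
    counts = [tally[x] for x in alphabet]
    return counts, [c / n for c in counts]

def multiplicity(counts):
    """Gamma(nu^n) = n! / prod_i n_i!, the number of sequences sharing the type."""
    gamma = factorial(sum(counts))
    for c in counts:
        gamma //= factorial(c)      # each partial quotient is an integer
    return gamma

alphabet = ["a", "b", "c"]          # hypothetical alphabet X
sequence = list("abacab")           # hypothetical sample, n = 6
counts, nu = type_of(sequence, alphabet)
print(counts, nu, multiplicity(counts))
```

For this sample the counts are (3, 2, 1), so 6!/(3! 2! 1!) = 60 distinct sequences induce the same type.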

Let $\Pi \subseteq P(X)$. Let $P_n$ denote a subset of P(X) which consists of all n-types.

Let $\Pi_n = \Pi \cap P_n$.

I-projection $\hat{p}$ of q on Π is $\hat{p} = \arg\inf_{p \in \Pi} I(p\|q)$, where $I(p\|q) = \sum_{i=1}^m p_i \log\frac{p_i}{q_i}$
is Kullback-Leibler distance, information divergence or minus relative entropy.
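
As a rough illustration of the definition, the sketch below finds an I-projection by brute force over a grid of points with common denominator n. The source q and the feasible set {p : p_3 >= 1/2} are hypothetical choices, and a grid search is used only because it is easy to check, not because it is the method of the paper.

```python
import math

def kl(p, q):
    """I(p||q) = sum_i p_i log(p_i / q_i), with the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def i_projection_on_grid(q, n, feasible):
    """Brute-force minimiser of I(p||q) over feasible 3-letter points c / n."""
    best, best_val = None, float("inf")
    for c1 in range(n + 1):
        for c2 in range(n + 1 - c1):
            counts = (c1, c2, n - c1 - c2)
            if not feasible(counts, n):
                continue
            val = kl([c / n for c in counts], q)
            if val < best_val:
                best, best_val = counts, val
    return best, best_val

q = (0.5, 0.3, 0.2)                              # hypothetical source
in_pi = lambda counts, n: 2 * counts[2] >= n     # hypothetical set: p_3 >= 1/2
best, value = i_projection_on_grid(q, 400, in_pi)
p_hat = [c / 400 for c in best]
print(p_hat, value)
```

For this convex set the minimiser sits on the boundary p_3 = 1/2, with the remaining mass split in the proportion q_1 : q_2.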

$\pi(\nu^n \in A \mid \nu^n \in B; q \mapsto \nu^n)$ will denote the conditional probability that if a type
drawn from $q \in P(X)$ belongs to $B \subseteq \Pi$ then it belongs to $A \subseteq \Pi$.

3. CWLLN, Gibbs conditioning

Conditioned Weak Law of Large Numbers (cf. [T], Q2], [21], [23], 0, 0, d)
in its standard form (cf. [2]) reads:

CWLLN. Let X be a finite set. Let Π be closed, convex set which does not contain
q. Let $n \to \infty$. Then for $\epsilon > 0$

$\lim_{n \to \infty} \pi(|\nu_i^n - \hat{p}_i| < \epsilon \mid \nu^n \in \Pi; q \mapsto \nu^n) = 1$ for i = 1, 2, ..., m.

CWLLN says that if types are confined to a closed, convex set Π then they
asymptotically conditionally concentrate on the I-projection $\hat{p}$ of the source of
types q on the set Π (i.e., informally, on the probability distribution from Π which
has the highest value of the relative entropy with respect to the source q).
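
The concentration claimed by CWLLN can be checked numerically on a toy case by enumerating all n-types. The sketch assumes a hypothetical source q = (0.5, 0.3, 0.2) and the hypothetical convex set {p : p_3 >= 1/2}, whose I-projection is (0.3125, 0.1875, 0.5); the function names are illustrative only.

```python
import math

def log_type_prob(counts, q):
    """log pi(nu^n; q) = log(n! / prod_i n_i!) + sum_i n_i log q_i."""
    out = math.lgamma(sum(counts) + 1)
    for c, qi in zip(counts, q):
        out += c * math.log(qi) - math.lgamma(c + 1)
    return out

def conditional_mass_near(p_hat, q, n, eps):
    """pi(|nu_i - p_hat_i| < eps for all i | nu_3 >= 1/2) for 3-letter n-types."""
    logs_all, logs_near = [], []
    for c1 in range(n + 1):
        for c2 in range(n + 1 - c1):
            counts = (c1, c2, n - c1 - c2)
            if 2 * counts[2] < n:            # type lies outside the set
                continue
            lp = log_type_prob(counts, q)
            logs_all.append(lp)
            if all(abs(c / n - ph) < eps for c, ph in zip(counts, p_hat)):
                logs_near.append(lp)
    top = max(logs_all)                      # shift logs to avoid underflow
    total = sum(math.exp(lp - top) for lp in logs_all)
    near = sum(math.exp(lp - top) for lp in logs_near)
    return near / total

q = (0.5, 0.3, 0.2)                  # hypothetical source
p_hat = (0.3125, 0.1875, 0.5)        # I-projection of q on {p : p_3 >= 1/2}
masses = {n: conditional_mass_near(p_hat, q, n, eps=0.05) for n in (50, 200, 800)}
for n, mass in masses.items():
    print(n, round(mass, 4))
```

The printed conditional mass should grow towards 1 as n increases, in line with the statement of CWLLN.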

Gibbs conditioning principle (GCP) says, very informally, that if the source q is
confined to produce sequences which lead to types in a convex, closed set Π then
elements of any such sequence (of fixed length t) behave asymptotically conditionally
as if they were drawn identically and independently from the I-projection of q
on Π - provided that the latter is unique (among other things).

GCP. Let X be a finite set. Let Π be closed, convex set which does not contain q.
Let $n \to \infty$. Then for a fixed t

$\lim_{n \to \infty} \pi(X_1 = x_1, \dots, X_t = x_t \mid \nu^n \in \Pi; q \mapsto \nu^n) = \prod_{l=1}^t \hat{p}_{x_l}$.
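
For t = 1 the statement can be checked numerically. Given the type, all sequences of that type are equally likely, so the conditional probability of the first letter equals the conditional expectation of the type. The sketch below uses this fact with a hypothetical source and the hypothetical set {p : p_3 >= 1/2}; names are illustrative.

```python
import math

def log_type_prob(counts, q):
    """log pi(nu^n; q) = log(n! / prod_i n_i!) + sum_i n_i log q_i."""
    out = math.lgamma(sum(counts) + 1)
    for c, qi in zip(counts, q):
        out += c * math.log(qi) - math.lgamma(c + 1)
    return out

def conditional_letter_probs(q, n, feasible):
    """pi(X_1 = x_i | nu^n in Pi), i = 1, 2, 3, as the conditional mean of nu_i^n."""
    logs, kept = [], []
    for c1 in range(n + 1):
        for c2 in range(n + 1 - c1):
            counts = (c1, c2, n - c1 - c2)
            if feasible(counts, n):
                logs.append(log_type_prob(counts, q))
                kept.append(counts)
    top = max(logs)                              # shift logs to avoid underflow
    weights = [math.exp(lp - top) for lp in logs]
    total = sum(weights)
    return [sum(w * c[i] / n for w, c in zip(weights, kept)) / total
            for i in range(3)]

q = (0.5, 0.3, 0.2)                              # hypothetical source
in_pi = lambda counts, n: 2 * counts[2] >= n     # hypothetical set: p_3 >= 1/2
probs = conditional_letter_probs(q, 400, in_pi)
print([round(p, 4) for p in probs])
```

The output should be close to the I-projection (0.3125, 0.1875, 0.5) of this q on this set.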

GCP was developed in [2] under the name of conditional quasi-independence of
outcomes. Later on, it was brought into a more abstract form in the large deviations
literature, where it also obtained the GCP name (cf. ^H]). A simple proof of
GCP can be found in 0]. GCP has also been proven for continuous alphabets (cf. (T3|, 0],
0).
4. The case of several I-projections

What happens when Π admits multiple I-projections? Does the conditional
concentration of types happen on them? If yes, do types concentrate on each of them?
If yes, what is the proportion? How does GCP extend to the case of multiple
I-projections?

4.1. Conditional Equi-concentration of Types on I-projections. Let
$d(a,b) = \sum_{i=1}^m |a_i - b_i|$ be the total variation metric (or any other equivalent metric) on
the set of probability distributions P(X). Let $B(a, \epsilon)$ denote an ε-ball - defined by
the metric d - which is centered at $a \in P(X)$.

An I-projection $\hat{p}$ of q on Π will be called proper if $\hat{p}$ is not an isolated point of Π.

ICET. Let X be a finite set. Let Π be such that it admits k proper I-projections
$\hat{p}^1, \hat{p}^2, \dots, \hat{p}^k$ of q. Let $\epsilon > 0$ be such that for j = 1, 2, ..., k $\hat{p}^j$ is the only proper
I-projection of q on Π in the ball $B(\hat{p}^j, \epsilon)$. Let $n \to \infty$. Then

(1) $\pi(\nu^n \in B(\hat{p}^j, \epsilon) \mid \nu^n \in \Pi; q \mapsto \nu^n) = 1/k$ for j = 1, 2, ..., k.

ICET² states that if a set Π admits several I-projections then the conditional
measure is spread among the proper I-projections equally. In less formal words: if
a random generator (i.e., q) is confined to produce types in Π then, as n gets large,
the generator 'hides itself' equally likely behind any of its proper I-projections on Π.
In yet other (statistical physics) words: each of the equilibrium points (i.e., proper
I-projections) is asymptotically conditionally equally probable. The conditional
equi-concentration of types 'phenomenon' resembles the thermodynamic coexistence
of phases (e.g., triple point of water, vapor and ice).
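
A toy numerical illustration of equi-concentration is sketched below. It assumes a uniform source on three letters and the hypothetical non-convex set {p : p_1 >= 1/2} ∪ {p : p_3 >= 1/2}, which has two proper I-projections, (0.5, 0.25, 0.25) and (0.25, 0.25, 0.5). The case is symmetric, so the equal split follows by symmetry alone; the sketch illustrates the statement rather than tests it.

```python
import math

def log_type_prob(counts, q):
    """log pi(nu^n; q) = log(n! / prod_i n_i!) + sum_i n_i log q_i."""
    out = math.lgamma(sum(counts) + 1)
    for c, qi in zip(counts, q):
        out += c * math.log(qi) - math.lgamma(c + 1)
    return out

def dist(a, b):
    """d(a, b) = sum_i |a_i - b_i|."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mass_near_each(projections, q, n, eps, feasible):
    """Conditional mass of 3-letter n-types in the eps-ball around each projection."""
    logs, kept = [], []
    for c1 in range(n + 1):
        for c2 in range(n + 1 - c1):
            counts = (c1, c2, n - c1 - c2)
            if feasible(counts, n):
                logs.append(log_type_prob(counts, q))
                kept.append([c / n for c in counts])
    top = max(logs)                              # shift logs to avoid underflow
    weights = [math.exp(lp - top) for lp in logs]
    total = sum(weights)
    return [sum(w for w, nu in zip(weights, kept) if dist(nu, p) < eps) / total
            for p in projections]

q = (1 / 3, 1 / 3, 1 / 3)                        # hypothetical uniform source
in_pi = lambda c, n: 2 * c[0] >= n or 2 * c[2] >= n
projections = [(0.5, 0.25, 0.25), (0.25, 0.25, 0.5)]
shares = mass_near_each(projections, q, 400, 0.2, in_pi)
print([round(s, 4) for s in shares])
```

Each ball should carry a share close to 1/k = 1/2, with almost no conditional mass left outside the two balls.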

Notes. 1) On an I-projection $\hat{p}$ which is not rational and which is at the same time an
isolated point, no conditional concentration of types happens. However, if the set
Π is such that an I-projection $\hat{p}$ of q on it is rational and at the same time an
isolated point, then types can concentrate on it. 2) Since X is finite, k is finite.

Weak Law of Large Numbers is a special - unconditional - case of CWLLN. CWLLN
itself is just a special - unique proper I-projection - case of ICET.

Two illustrative examples of the Conditional Equi-concentration of Types on
I-projections (ICET) can be found in the exploratory study. There also Asymptotic
Equiprobability of I-projections - a precursor to ICET - was formulated.

4.2. Extended Gibbs conditioning principle. 

EGCP. Let X be a finite set. Let Π be such that it admits k proper I-projections
$\hat{p}^1, \hat{p}^2, \dots, \hat{p}^k$ of q on Π. Then for a fixed t:

(2) $\lim_{n \to \infty} \pi(X_1 = x_1, \dots, X_t = x_t \mid \nu^n \in \Pi; q \mapsto \nu^n) = \frac{1}{k} \sum_{j=1}^k \prod_{l=1}^t \hat{p}^j_{x_l}$.

EGCP, for t = 1, says that the conditional probability of a letter is asymptotically
given by the equal-weight mixture of proper I-projection probabilities of the letter.
For a general sequence, EGCP states that the conditional probability of a sequence
is asymptotically equal to the equal-weight mixture of k joint probability distributions.
Each of the k joint distributions is the one the sequence would have if it were iid
distributed according to a proper I-projection.
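
The difference between a mixture of joint distributions and a product of mixture marginals shows up already for t = 2. The sketch below assumes a uniform source and the hypothetical set {p : p_1 >= 1/2} ∪ {p : p_3 >= 1/2}, with proper I-projections (0.5, 0.25, 0.25) and (0.25, 0.25, 0.5). It uses the fact that, given a type with counts c, the first two letters are drawn without replacement from the urn described by c.

```python
import math

def log_type_prob(counts, q):
    """log pi(nu^n; q) = log(n! / prod_i n_i!) + sum_i n_i log q_i."""
    out = math.lgamma(sum(counts) + 1)
    for c, qi in zip(counts, q):
        out += c * math.log(qi) - math.lgamma(c + 1)
    return out

def conditional_pair_prob(a, b, q, n, feasible):
    """pi(X_1 = x_a, X_2 = x_b | nu^n in Pi) for a 3-letter alphabet.

    Given counts c: pi(X_1 = x_a, X_2 = x_b | c) = c_a (c_b - [a == b]) / (n (n - 1)).
    """
    logs, pair = [], []
    for c1 in range(n + 1):
        for c2 in range(n + 1 - c1):
            counts = (c1, c2, n - c1 - c2)
            if not feasible(counts, n):
                continue
            logs.append(log_type_prob(counts, q))
            pair.append(counts[a] * (counts[b] - (a == b)) / (n * (n - 1)))
    top = max(logs)                              # shift logs to avoid underflow
    weights = [math.exp(lp - top) for lp in logs]
    return sum(w * p for w, p in zip(weights, pair)) / sum(weights)

q = (1 / 3, 1 / 3, 1 / 3)                        # hypothetical uniform source
in_pi = lambda c, n: 2 * c[0] >= n or 2 * c[2] >= n
value = conditional_pair_prob(0, 0, q, 400, in_pi)
mixture = 0.5 * (0.5 ** 2 + 0.25 ** 2)           # RHS of (2) for the pair (x_1, x_1)
product_of_marginals = 0.375 ** 2                # what independence would give
print(round(value, 4), mixture, product_of_marginals)
```

The computed value should lie close to the mixture 0.15625 and visibly away from 0.140625, the value obtained by squaring the mixture marginal.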



² See Appendix for a proof of ICET and EGCP.
5. μ-projections, Maximum Probability method

μ-projection $\hat{\nu}^n$ of q on $\Pi_n \neq \emptyset$ is defined as: $\hat{\nu}^n = \arg\sup_{\nu^n \in \Pi_n} \pi(\nu^n; q)$,
where $\pi(\nu^n; q) = \Gamma(\nu^n) \prod_{i=1}^m (q_i)^{n \nu_i^n}$ (cf. ED). Alternatively, the μ-projection can be
defined as $\hat{\nu}^n = \arg\sup_{\nu^n \in \Pi_n} \pi(\nu^n \mid \nu^n \in \Pi_n; q)$, where $\pi(\nu^n \mid \nu^n \in \Pi_n; q)$ denotes
the conditional probability that if an n-type belongs to $\Pi_n$ then it is just the type
$\nu^n$. Yet another equivalent definition - a bayesian one - of μ-projection can be
adapted from [Tu].

The concept of μ-projection is associated with the Maximum Probability method
(cf. 0).

5.1. Asymptotic identity of μ-projections and I-projections. At ([S], Thm
1 and its Corollary, aka MaxProb/MaxEnt Thm) it was shown that the maximum
probability type converges to the I-projection, provided that Π is defined by
differentiable constraints. A more general result, which states asymptotic identity of
μ-projections and I-projections, has been presented before and will be recalled here.

MaxProb/MaxEnt. ^2] Let X be a finite set. Let $\mathcal{M}_n$ be the set of all μ-projections
of q on $\Pi_n$. Let $\mathcal{I}$ be the set of all I-projections of q on Π. For $n \to \infty$, $\mathcal{M}_n = \mathcal{I}$.

Since $\pi(\nu^n; q)$ is defined for $\nu^n \in Q^m$, μ-projection can be defined only for
$\Pi_n$ when n is finite. The Theorem permits to define a μ-projection $\hat{\nu}$ also on Π:
$\hat{\nu} = \arg\sup_{r \in \Pi} -\sum_{i=1}^m r_i \log\frac{r_i}{q_i}$. Thus μ-projections and I-projections on Π are
indistinguishable.

It is worth highlighting that for a finite n, μ-projections and I-projections of q
on $\Pi_n$ are in general different. This explains why the μ-form of the probabilistic laws
deserves to be stated separately from the I-form, though formally they are
indistinguishable. Thus, MaxProb/MaxEnt Thm (in its new and to a smaller extent also
in its old version) permits directly to state μ-projection variants of CWLLN, GCP,
ICET and EGCP: μCWLLN, μGCP, μCET and Boltzmann Conditioning Principle
(BCP).
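
That the two projections can differ at finite n is easy to see on a hand-picked case. The sketch below assumes a uniform source, n = 20 and a hypothetical feasible set consisting of just two 20-types; it is meant only as an illustration of the remark above, not as an example taken from the paper.

```python
import math

def kl(p, q):
    """I(p||q) = sum_i p_i log(p_i / q_i), with the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def type_prob(counts, q):
    """pi(nu^n; q) = (n! / prod_i n_i!) * prod_i q_i^{n_i}."""
    gamma = math.factorial(sum(counts))
    for c in counts:
        gamma //= math.factorial(c)
    return gamma * math.prod(qi ** c for qi, c in zip(q, counts))

n = 20
q = (1 / 3, 1 / 3, 1 / 3)                    # hypothetical uniform source
feasible_types = [(15, 3, 2), (10, 10, 0)]   # hypothetical Pi_n, given as counts

mu_projection = max(feasible_types, key=lambda c: type_prob(c, q))
i_projection = min(feasible_types, key=lambda c: kl([x / n for x in c], q))
print("mu-projection:", mu_projection, " I-projection:", i_projection)
```

Here the type (10, 10, 0) has the larger multiplicity (184756 against 155040) and hence the larger probability, while (15, 3, 2) has the smaller divergence from q, so the two methods select different types.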

5.2. μ-form of CWLLN and GCP.

μCWLLN. Let X be a finite set. Let Π be closed, convex set which does not contain
q. Let $n \to \infty$. Then for $\epsilon > 0$

$\lim_{n \to \infty} \pi(|\nu_i^n - \hat{\nu}_i| < \epsilon \mid \nu^n \in \Pi; q \mapsto \nu^n) = 1$ for i = 1, 2, ..., m.

The core of μCWLLN can be loosely expressed as: types, when confined to a convex,
closed set Π, conditionally concentrate on the asymptotically most probable type $\hat{\nu}$.
It is worth a comparison with the reading of the I-projection variant of CWLLN
(see Sect. 3).

Similarly, a μ-variant of GCP exists:

μGCP. Let X be a finite set. Let Π be closed, convex set which does not contain
q. Let $n \to \infty$. Then for a fixed t

$\lim_{n \to \infty} \pi(X_1 = x_1, \dots, X_t = x_t \mid \nu^n \in \Pi; q \mapsto \nu^n) = \prod_{l=1}^t \hat{\nu}_{x_l}$.

5.3. Conditional Equi-concentration of Types on μ-projections. A μ-projection
$\hat{\nu}$ of q on Π will be called proper if $\hat{\nu}$ is not an isolated point of Π.

μCET. Let X be a finite set. Let there be k proper μ-projections $\hat{\nu}^1, \hat{\nu}^2, \dots, \hat{\nu}^k$ of
q on Π. Let $\epsilon > 0$ be such that for j = 1, 2, ..., k $\hat{\nu}^j$ is the only proper μ-projection
of q on Π in the ball $B(\hat{\nu}^j, \epsilon)$. Let $n \to \infty$. Then

(3) $\pi(\nu^n \in B(\hat{\nu}^j, \epsilon) \mid \nu^n \in \Pi; q \mapsto \nu^n) = 1/k$ for j = 1, 2, ..., k.
5.4. Boltzmann conditioning principle.

BCP. Let X be a finite set. Let there be k proper μ-projections $\hat{\nu}^1, \hat{\nu}^2, \dots, \hat{\nu}^k$ of q
on Π. Then for a fixed t:

(4) $\lim_{n \to \infty} \pi(X_1 = x_1, \dots, X_t = x_t \mid \nu^n \in \Pi; q \mapsto \nu^n) = \frac{1}{k} \sum_{j=1}^k \prod_{l=1}^t \hat{\nu}^j_{x_l}$.

5.5. MaxEnt or MaxProb? μ-projections and I-projections are asymptotically
indistinguishable (recall MaxProb/MaxEnt Thm, Sect. 5.1). In plain words: for
$n \to \infty$ REM/MaxEnt selects the same distribution(s) as MaxProb (in its more
general form, which instead of the maximum probable types selects
supremum-probable μ-projections). This result (in the older form, [Hj) was in [H] interpreted
as saying that REM/MaxEnt can be viewed as an asymptotic instance of the simple
and self-evident Maximum Probability method.

Alternatively, it has been suggested to view REM/MaxEnt as a separate method and
hence to read the MaxProb/MaxEnt Thm as claiming that REM/MaxEnt asymptotically
coincides with MaxProb. If one adopts this interesting and legitimate view
then it is necessary to face the fact that if n is finite, the two methods in general
differ.

6. Jeffreys conditioning mentioned 

Instead of a Summary (which is already presented in Sect. 1), Conditional
Equi-concentration of types on J-projections (JCET) and Jeffreys conditioning principle³
(JCP) will be mentioned, in passing.

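γ-projection $\hat{\nu}^n$ of $q \in Q^m$ on $\Pi_n$ is: $\hat{\nu}^n = \arg\sup_{\nu^n \in \Pi_n} \pi(\nu^n; q)\,\pi(nq; \nu^n)$.
J-projection (or Jeffreys projection) $\hat{p}$ of $q \in Q^m$ on Π is
$\hat{p} = \arg\inf_{p \in \Pi} \sum_{i=1}^m \left( p_i \log\frac{p_i}{q_i} + q_i \log\frac{q_i}{p_i} \right)$.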

Let $q \in Q^m$. $\pi(\nu^n \in A \mid \nu^n \in B; (q \mapsto \nu^n) \wedge (\nu^n \mapsto q))$ will denote the conditional
probability that if a type - which was drawn from $q \in P(X)$ and was at the same
time used as a source of the type q - belongs to $B \subseteq \Pi$ then it belongs to $A \subseteq \Pi$.
A J-projection $\hat{p}$ of q on Π will be called proper if it is not an isolated point of Π.

JCET. Let X be a finite set. Let $q \in Q^m$. Let there be k proper J-projections
$\hat{p}^1, \hat{p}^2, \dots, \hat{p}^k$ of q on Π. Let $\epsilon > 0$ be such that for j = 1, 2, ..., k $\hat{p}^j$ is the only
proper J-projection of q on Π in the ball $B(\hat{p}^j, \epsilon)$. Let $n_0$ be denominator of the
smallest common divisor of $q_1, q_2, \dots, q_m$. Let $n = u n_0$, $u \in N$. Let $u \to \infty$. Then

(5) $\pi(\nu^n \in B(\hat{p}^j, \epsilon) \mid \nu^n \in \Pi; (q \mapsto \nu^n) \wedge (\nu^n \mapsto q)) = 1/k$ for j = 1, 2, ..., k.

In words, types which were 'emitted' from q and were at the same time used as
a source of q-types, conditionally equi-concentrate on J-projections of q on Π.

J-projections and γ-projections asymptotically coincide (cf. [12], and [7] for
an example). Hence, a γ-projection alternative of JCET is valid as well. It says
that: types which were 'emitted' from q and were at the same time used as a
source of q-types, conditionally equi-concentrate on those of them which have the
highest/supremal value of $\pi(\nu^n; q)\,\pi(nq; \nu^n)$. Similarly, JCP can be considered in
its J- or γ-form.

μ-projection is based on the probability $\pi(\nu^n; q)$; thus it can be viewed as a
UNI-projection. γ-projection is based on $\pi(\nu^n; q)\,\pi(nq; \nu^n)$, thus it can be viewed
as an AND-projection. It is possible to consider also an OR-projection defined as
$\hat{\nu}^n = \arg\sup_{\nu^n \in \Pi_n} \pi(\nu^n; q) + \pi(nq; \nu^n)$. However, there seems to be no obvious
analytic way to define its asymptotic form. Despite that, it is possible to
expect that an OR-type of CWLLN/CET holds.

³ It should not be confused with Jeffrey principle of updating subjective probability.

The μ-, γ-, OR-projection CET can be summarized by a (bold) statement: types
conditionally equi-concentrate on those which are asymptotically the most probable.

Acknowledgments Supported by VEGA 1/0264/03. It is a pleasure to thank 
7. Appendix

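A sketch of proof of ICET.

(6) $\pi(\nu^n \in B(\hat{p}^j, \epsilon) \mid \nu^n \in \Pi; q \mapsto \nu^n) \le \frac{\sum_{\nu^n \in B_n(\hat{p}^j, \epsilon)} \pi(\nu^n; q)}{\sum_{\nu^n \in \Pi_n} \pi(\nu^n; q)}$,

where $B_n(\hat{p}^j, \epsilon) = B(\hat{p}^j, \epsilon) \cap \Pi_n$. Let there be $k_{B,n}$ I-projections
$\hat{p}^1_{B,n}, \hat{p}^2_{B,n}, \dots, \hat{p}^{k_{B,n}}_{B,n}$ of q on $\bigcup_{j=1}^k B_n(\hat{p}^j, \epsilon)$. Let $k^j_{B,n}$ denote the
number of I-projections of q on $B_n(\hat{p}^j, \epsilon)$. $\hat{p}_{B,n}$ will stand for any of such
I-projections. Denote the set $B_n(\hat{p}^j, \epsilon) \setminus \bigcup_{i=1}^{k^j_{B,n}} \{\hat{p}^i_{B,n}\}$ as $B \setminus k_{B,n}$.

Similarly, let there be $k_{\Pi,n}$ I-projections $\hat{p}^1_{\Pi,n}, \hat{p}^2_{\Pi,n}, \dots, \hat{p}^{k_{\Pi,n}}_{\Pi,n}$ of q on $\Pi_n$. Denote
the set $\Pi_n \setminus \bigcup_{i=1}^{k_{\Pi,n}} \{\hat{p}^i_{\Pi,n}\}$ as $\Pi \setminus k_{\Pi,n}$. The MaxProb/MaxEnt Thm implies that for
$n \to \infty$ the RHS of (6) can be written as: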

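(7) $\frac{\pi(\hat{p}_{B,n}; q) \left( k^j_{B,n} + \sum_{\nu^n \in B \setminus k_{B,n}} \frac{\pi(\nu^n; q)}{\pi(\hat{p}_{B,n}; q)} \right)}{\pi(\hat{p}_{\Pi,n}; q) \left( k_{\Pi,n} + \sum_{\nu^n \in \Pi \setminus k_{\Pi,n}} \frac{\pi(\nu^n; q)}{\pi(\hat{p}_{\Pi,n}; q)} \right)}$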



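Recall a standard inequality:
Lemma. Let $\nu^n$, $\hat{\nu}^n$ be two types from $\Pi_n$. Then

$\frac{\pi(\nu^n; q)}{\pi(\hat{\nu}^n; q)} < \left( \frac{n}{m} \right)^m \frac{\prod_{i=1}^m (q_i/\nu_i^n)^{n \nu_i^n}}{\prod_{i=1}^m (q_i/\hat{\nu}_i^n)^{n \hat{\nu}_i^n}}$.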

The Lemma implies that the ratio in the numerator of (7) converges to zero
as $n \to \infty$. The same implication holds for the ratio in the denominator. $\hat{p}_{B,n}$
converges in the metric to $\hat{p}^j$, hence $k^j_{B,n}$ converges to 1 as $n \to \infty$. Similarly, $k_{\Pi,n}$
converges to k and $\pi(\hat{p}_{B,n}; q)/\pi(\hat{p}_{\Pi,n}; q)$ converges to 1 as n goes to infinity. This taken together
implies that the RHS of (6) converges to 1/k as $n \to \infty$. The inequality (6) thus
turns into equality. □

A sketch of proof of EGCP.

(8) $\pi(X_1 = x_1, \dots, X_t = x_t \mid \nu^n \in \Pi; q \mapsto \nu^n) = \frac{\sum_{\nu^n \in \Pi_n} \pi(X_1 = x_1, \dots, X_t = x_t, \nu^n)}{\sum_{\nu^n \in \Pi_n} \pi(\nu^n; q)}$

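Partition $\Pi_n$ into $\Pi \setminus k_{\Pi,n}$ and the rest, which will be denoted by $\cup\hat{p}_{\Pi,n}$. The
MaxProb/MaxEnt Thm implies that for $n \to \infty$ the RHS of (8) can be written as:

(9) $\frac{\sum_{\nu^n \in \cup\hat{p}_{\Pi,n}} \pi(X_1 = x_1, \dots, X_t = x_t, \nu^n) + \sum_{\nu^n \in \Pi \setminus k_{\Pi,n}} \pi(X_1 = x_1, \dots, X_t = x_t, \nu^n)}{\pi(\hat{p}_{\Pi,n}; q) \left( k_{\Pi,n} + \sum_{\nu^n \in \Pi \setminus k_{\Pi,n}} \frac{\pi(\nu^n; q)}{\pi(\hat{p}_{\Pi,n}; q)} \right)}$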
By the Lemma, the ratio in the denominator of (9) converges to zero as n goes
to infinity. The second term in the numerator goes to zero as well as $n \to \infty$
(to see this, express the joint probability $\pi(X_1 = x_1, \dots, X_t = x_t, \nu^n)$ as
$\pi(X_1 = x_1, \dots, X_t = x_t \mid \nu^n)\,\pi(\nu^n; q)$ and employ the Lemma). Thus, for $n \to \infty$ the RHS
of (8) becomes $\frac{1}{k} \sum_{j=1}^k \pi(X_1 = x_1, \dots, X_t = x_t \mid \hat{p}^j)$. Finally, invoke Csiszár's
'urn argument' (cf. jjj) to conclude that the asymptotic form of the RHS of (8) is
the RHS of (2).
8. Changes wrt the Version 3

Three major changes: 1) Definition of proper I-projection has been changed. 2)
An argument preceding Eq. (7) in the proof of ICET (and similarly Eq. (9) in the
proof of EGCP) is now correctly stated. 3) Abstract was rewritten to better reflect
the contents of the paper.

This is the definitive form of the work. To appear in the Proceedings of the
MaxEnt'04 workshop.